In precision farming, identifying weeds is an essential first step in planning
an integrated pest management program in cereals. Knowing the species
present tells us which herbicides to use to control them, especially in
non-weeding crops where mechanical methods (tillage, hand weeding, hoeing,
and mowing) are not effective. Deep learning based on convolutional neural
networks (CNN) can therefore identify weeds automatically, so that an
intelligent system can spray herbicides locally, avoiding their large-scale
use and preserving the environment. In this article we propose a smart system
based on object detection models, implemented on a Raspberry Pi, that seeks to
identify the presence of relevant objects (weeds) in an area (wheat crop) in
real time and to classify those objects for decision support, including spot
spraying with a herbicide chosen according to the weed detected.
1. INTRODUCTION

Cereal crops play an important economic and social role in Morocco, as they constitute between 10
and 20 percent of the gross domestic product (GDP). They represent about 40 percent of the food budget of
Moroccan families and cover about 25 percent of the total needs of the herd [1]. Weeds compete with
cereals for water and nutrients. Some weeds can also serve as secondary hosts for various pathogens,
nematodes or pests [2]. Agricultural technologies offer an opportunity to improve the governance of
agricultural policy; they can provide answers to these issues and support the economic development of our nation [3].

Weed detection remains a difficult problem due to variation in plant appearance, changes in lighting,
foliage occlusions and different stages of growth under field conditions. Monocotyledons and dicotyledons are
the two groups into which all flowering plants (angiosperms) were formerly divided [4]. Cereals, and
especially wheat, are monocotyledon plants. In wheat fields, dicotyledon weeds are therefore controlled
easily and effectively by herbicides designed for dicotyledon plants, whereas monocotyledon weeds (the same
type as wheat) require a selective herbicide specific to the weed, applied by targeting and spraying it [5].
Current approaches for weed and crop recognition, segmentation, and detection rely
primarily on conventional machine learning techniques, which require a large number of hand-designed
features for modelling [6]. Several studies have been completed on this topic, and many weed detection
algorithms have been developed to achieve high-precision weed removal. These algorithms
contributed to the emergence of many robotic weed control systems that focus principally on single tactics
[7]. Among the existing systems, we can cite electrical discharging [8], chemical spraying [9], flaming, and
mechanical weeding [10]. Despite their development, these systems still suffer from drawbacks, the
most important of which are that they are time-consuming, polluting, and can harm the crop.

Intelligent weed removal machines depend on the performance of the machine vision system that
detects weeds. These systems mainly rely on deep learning, which enables the classification and localization of
weeds; the most widely used technique for this goal is the convolutional neural network (CNN). This article discusses a
CNN model implemented on an intelligent system to detect and locate the types of weeds known in wheat crops
in the region of Beni Mellal-Khenifra, Morocco. This smart system will help farmers identify weed species
earlier and classify them into monocots and dicots, in order to achieve localized spraying that eliminates them
and allows crops to grow freely, benefiting from mineral salts and water. The purpose is to preserve
the environment and reduce the cost of the operation.


2. RESEARCH METHOD 

In our experiment we used a professional Nikon 7000 camera to collect 1318 real images of wheat
fields to train a CNN model, under different lighting conditions (from morning to afternoon, in sunny and
cloudy weather). We chose two types of weed (monocotyledon and dicotyledon) that are the best known in
our region of Beni Mellal-Khenifra (Figures 1 and 2). We applied technical options to the images that
add random variations to generate more training data and improve the performance of
the model [11]. The model used is the you only look once (YOLO) algorithm, which is an efficient choice
when real-time detection is needed [12] without losing too much accuracy; it predicts class labels and
detects the locations of weeds so that the better decision can be taken (Table 1) [13]. Due to its high-speed inference,
this model has the potential to be deployed on a single-board platform like the Raspberry Pi, to detect
and control weeds in real time using the features it includes (camera, Wi-Fi...) [14].


Figure 1. Dicotyledonous weed (Convolvulus)    Figure 2. Monocotyledonous weed (Phalaris)


Table 1. Decision-making for controlling weeds


Crop affiliation group   Weed identified                                 Decision to make
Dicots                   Monocotyledon                                   Spray a monocotyledon herbicide
Dicots                   Dicotyledon (same type as the crop;             Spray a selective herbicide specifically
                         e.g., Convolvulus)                              targeting this dicotyledon weed
Monocots                 Monocotyledon (same type as the crop;           Spray a selective herbicide specifically
                         e.g., Phalaris in our case)                     targeting this monocotyledon weed
Monocots                 Dicotyledon                                     Spray a dicotyledon herbicide
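
The decision rule of Table 1 can be sketched as a small function. This is only an illustration: the function name and the group labels "monocot" and "dicot" are assumptions of the sketch, not part of the system described here.

```python
def spray_decision(crop_group: str, weed_group: str) -> str:
    """Return the Table 1 decision for a crop group and a detected weed group.

    Both arguments are "monocot" or "dicot" (labels chosen for this sketch).
    """
    groups = {"monocot", "dicot"}
    if crop_group not in groups or weed_group not in groups:
        raise ValueError("groups must be 'monocot' or 'dicot'")
    if crop_group == weed_group:
        # Weed of the same type as the crop: a group-wide herbicide would
        # also harm the crop, so a selective product is required.
        return f"selective herbicide targeting this {weed_group} weed"
    # Weed of the other type: a herbicide for that whole group spares the crop.
    return f"{weed_group} herbicide"


# Wheat is a monocot crop.
print(spray_decision("monocot", "monocot"))  # a Phalaris-like weed
print(spray_decision("monocot", "dicot"))    # a Convolvulus-like weed
```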


2.1. Object detection method 

Object detection is one of the most popular computer vision tasks due to its versatility [15]. It
involves predicting the presence of one or more objects, along with their classes and bounding boxes [16].
YOLO is a state-of-the-art object detector that can perform object detection in real time with good accuracy.
YOLOv5s is a new release of the YOLO family of models that appeared on June 25, 2020. It is one of the
fastest CNN-based object detection algorithms, combining bounding box prediction and object
classification into a single end-to-end differentiable network; it can classify an image into a category and
detect multiple objects within an image [17]. This algorithm applies a single neural network to the full
image: the network divides the image into regions and predicts bounding boxes and
probabilities for each region directly from the full image in one evaluation. These bounding boxes are weighted
by the predicted probabilities. YOLO is a fast and accurate algorithm that is suitable for our operations.
We therefore train the YOLO model and implement it on a Raspberry Pi to guarantee real-time
detection and identification. This technological innovation opens up new possibilities for embedded
internet of things (IoT) applications in fields such as automatic weeding systems that allow localized
spraying with a chosen herbicide and with advanced analysis capability [18].
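
To illustrate how bounding boxes are weighted by the predicted probabilities, the following minimal sketch scores each predicted box by multiplying its objectness by its best class probability and keeps the confident ones. The `Prediction` class, the function name and the 0.5 threshold are assumptions made for this example; it is not the detector's actual post-processing code.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    box: tuple          # (x_center, y_center, width, height), normalized
    objectness: float   # probability that the box contains an object
    class_probs: dict   # class name -> conditional class probability


def score_predictions(preds, threshold=0.5):
    """Weight each box by its best class probability and keep confident ones."""
    kept = []
    for p in preds:
        label, prob = max(p.class_probs.items(), key=lambda kv: kv[1])
        score = p.objectness * prob
        if score >= threshold:
            kept.append((label, round(score, 3), p.box))
    return kept


# Hypothetical predictions for one image.
preds = [
    Prediction((0.5, 0.5, 0.2, 0.2), 0.9, {"phalaris": 0.8, "convolvulus": 0.2}),
    Prediction((0.1, 0.1, 0.1, 0.1), 0.3, {"phalaris": 0.4, "convolvulus": 0.6}),
]
print(score_predictions(preds))
```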


2.2. Loading and data preparation 

As with any deep learning task, the first and most important step is to prepare the dataset. The dataset is
considered the fuel that runs any deep learning model [19]. The dataset for our experiment contains
1318 images of two weed species well known in this region, collected in wheat fields of the region,
which is known for its temperate climate. The first is Phalaris paradoxa and the second is Convolvulus (Table 2). These
images belong to two different classes, and each class contains RGB images that show plants at
different growth stages. The images come in various sizes and are in png format; consequently, they had to be
resized to 416x416 pixels before being used as input to the CNN model [20].


Table 2. A list of the weed classes available for the study


Scientific name           English name   Moroccan name   Affiliation group   Number of images
Convolvulus arvensis L.   Convolvulus    Lwaya           Dicotyledon         904
Phalaris paradoxa         Phalaris       Zouane          Monocotyledon       401


The first thing we did was annotate the images, a key technique for creating training
data for computer vision. For the model to perceive objects in their surroundings and automatically
assign them a label, annotated images are needed to train the model to see an area full of objects as we
do [21]. For this we used the bounding box technique, which requires labellers to draw a box as close as possible to
the edges of key objects within the image (Figure 3). The top-left and bottom-right points of each box were then
stored in the corresponding txt file following the syntax class_id x y width height [22].
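
As an illustration of the label syntax, the sketch below converts the top-left and bottom-right pixel coordinates of a box into one label line. It assumes the usual YOLO convention in which x and y are the box center and all four values are normalized by the image size; the function name and the sample coordinates are hypothetical.

```python
def corners_to_yolo(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert pixel corner coordinates to a normalized YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A hypothetical box drawn on a 416x416 image.
line = corners_to_yolo(1, 104, 52, 312, 260, 416, 416)
print(line)  # 1 0.500000 0.375000 0.500000 0.500000
```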


Figure 3. Bounding boxes example used 


As Figure 4 shows, three sets of images are needed to train the model, so we divided our
images into train, validation, and test splits to prevent the network from overfitting and to
evaluate the model accurately [23]. The training set is the largest part of our dataset (70%) and is
reserved for training the model; the model learns the correct outputs from these images during
the training step. The validation set is a separate section of the dataset (20%) that is
used during training to evaluate the model, reporting validation metrics after each
training epoch, such as validation mean average precision (mAP) or validation loss. The test set (10% of our
dataset) is used after the training experiments to estimate the final performance of the model.

[Figure 4 diagram: Original labeled data; Training data; Validation set]


Figure 4. The three basic sets for the model training
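
A 70/20/10 split of this kind can be sketched as follows. The file names, the fixed seed and the function name are placeholders for the example, not the tooling actually used in the experiment.

```python
import random


def split_dataset(items, train=0.7, valid=0.2, seed=0):
    """Shuffle items and split them into train, validation and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = int(len(items) * train)
    n_valid = int(len(items) * valid)
    return (
        items[:n_train],
        items[n_train:n_train + n_valid],
        items[n_train + n_valid:],  # whatever remains is the test set
    )


# Placeholder file names standing in for the 1318 collected images.
names = [f"img_{i:04d}.png" for i in range(1318)]
train_set, valid_set, test_set = split_dataset(names)
print(len(train_set), len(valid_set), len(test_set))  # 922 263 133
```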


Other preprocessing options are also necessary to ensure that all our images are
correctly formatted for the model and to help it detect edges better. This processing applies to all (train, valid,
and test) images to reduce training time and improve inference speed [24]. In our case we used a set of
configurations such as the following: i) crop: crops each image to a specified section, such as the
bottom third; ii) contrast: boosts contrast based on the image's histogram to improve normalization and line
detection in varying lighting conditions; and iii) tile: splits images into tiles to increase accuracy on small
objects.

The last phase of data preparation is data augmentation, a method applied to the images in our
training set to produce new and different training samples. It applies domain-specific techniques to
existing examples to generate more training data and improve the model performance [25]. Below are some of the
methods used in the experiment, which gave us more than 3000 images after this operation (Figure 5): i)
exposure: adds a degree of variability to image brightness to help the system be more resilient
to lighting and camera setting changes; ii) flip: adds a vertical or horizontal flip to help the
model be insensitive to subject orientation; and iii) crop: adds variability to positioning and size, helping
the model be more resilient to subject translations and camera position.


[Figure 5 panels: generated image with horizontal flip; produced image with crop method]


Figure 5. Set of data-augmentation methods applied
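
The flip and exposure methods can be sketched on a tiny grayscale image stored as nested lists. This is an illustration under simplifying assumptions (single-channel 8-bit pixels, hypothetical function names); it also shows that a horizontal flip must mirror the x coordinate of each normalized bounding box label.

```python
def hflip_image(img):
    """Mirror an image stored as rows of pixel values."""
    return [row[::-1] for row in img]


def hflip_label(label):
    """Mirror a label (class_id, x, y, w, h) with normalized coordinates."""
    class_id, x, y, w, h = label
    return (class_id, 1.0 - x, y, w, h)


def adjust_exposure(img, factor):
    """Scale 8-bit pixel intensities by `factor`, clipping to 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


image = [[10, 20, 30],
         [40, 50, 60]]
print(hflip_image(image))                  # [[30, 20, 10], [60, 50, 40]]
print(hflip_label((0, 0.25, 0.5, 0.2, 0.4)))
print(adjust_exposure(image, 5))           # bright pixels saturate at 255
```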


2.3. Deep neural network architecture 

During our experiments, we used a YOLO model with some modifications. We reduced
the number of detection scales to two, and the model uses scaling multipliers of 0.50 for width and
0.33 for depth [26]. It is composed of a backbone, in which our original 416*416*3 image is input into the
Focus structure. The slicing operation first turns it into a 208*208*12 feature map, and a
convolution with 32 kernels then produces a 208*208*32 feature map. A
cross-stage-partial (CSP) structure is also designed into the backbone for feature extraction [27].
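
The slicing step of the Focus structure can be sketched in plain Python as a space-to-depth rearrangement: the four pixels of every 2x2 block are stacked along the channel axis, which halves the height and width and multiplies the channels by four. The nested-list representation and function names are choices made for this sketch; real implementations perform the same slicing on tensors.

```python
def focus_slice(img):
    """Space-to-depth slicing: (H, W, C) nested lists -> (H/2, W/2, 4C)."""
    h, w = len(img), len(img[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            # Concatenate the four pixels of each 2x2 block along the channels.
            row.append(img[i][j] + img[i + 1][j] + img[i][j + 1] + img[i + 1][j + 1])
        out.append(row)
    return out


def shape(t):
    """Return (height, width, channels) of a nested-list image."""
    return (len(t), len(t[0]), len(t[0][0]))


image = [[[0, 0, 0] for _ in range(416)] for _ in range(416)]
sliced = focus_slice(image)
print(shape(sliced))  # (208, 208, 12)
```

The 32-kernel convolution that follows only changes the channel count, which gives the 208*208*32 feature map.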
Furthermore, a neck part uses a feature pyramid network (FPN) and a path aggregation network (PAN)
to make object detection predictions at different scale levels [28]. Finally, the model uses GIOU_Loss as the
bounding box loss function at the output [29]. The parameters of the different layers
(283 layers) have the same structure as the small YOLOv5 model, as shown in
Figures 6 and 7.
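
For reference, the generalized IoU behind GIOU_Loss can be sketched for two axis-aligned boxes as below. The corner format (x1, y1, x2, y2), the function names and the assumption that both boxes have positive area are choices of this sketch, which is not the training code itself.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Area of the smallest box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    # Penalize the empty part of the enclosing box, so that even
    # non-overlapping boxes yield a useful training signal.
    return iou - (enclose - union) / enclose


def giou_loss(pred, target):
    """Loss is 0 for a perfect match and grows as the boxes move apart."""
    return 1.0 - giou(pred, target)


print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes, loss above 1
```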


[Figure 6 diagram. Input terminal: Mosaic data enhancement. Backbone: Focus structure, CSP structure. FPN+PAN structure. Prediction: GIOU_Loss]


Figure 6. Deep neural network architecture


depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]

head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]


Figure 7. Model configuration 
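
The effect of the two multipliers in Figure 7 can be sketched as follows, assuming the scaling rule of the public YOLOv5 code: repeat counts are multiplied by the depth multiple and rounded, and channel counts are multiplied by the width multiple and rounded up to a multiple of 8. The function names are hypothetical.

```python
import math

DEPTH_MULTIPLE = 0.33
WIDTH_MULTIPLE = 0.50


def scaled_depth(n, gd=DEPTH_MULTIPLE):
    """Number of module repeats after depth scaling (single modules unchanged)."""
    return max(round(n * gd), 1) if n > 1 else n


def scaled_width(channels, gw=WIDTH_MULTIPLE, divisor=8):
    """Output channels after width scaling, rounded up to a multiple of `divisor`."""
    return math.ceil(channels * gw / divisor) * divisor


print(scaled_width(64))  # 32: the Focus layer ends up with 32 kernels
print(scaled_depth(9))   # 3: a 9-repeat BottleneckCSP becomes 3 repeats
print(scaled_depth(3))   # 1
```

Under this assumption, the configured Focus width of 64 becomes the 32 convolution kernels mentioned in the text.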


3. RESULT AND DISCUSSION 

With the prepared custom data, we trained the model configured above for 200 epochs. The number of
epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset.
The model has 7255094 parameters across 283 layers (Figure 8). To evaluate the performance of the model,
we employed several metrics, including mean average precision (mAP), one of the popular metrics for
measuring the accuracy of object detectors, defined by the formula [30]:


$$\mathrm{mAP} = \frac{1}{Q} \sum_{q=1}^{Q} \mathrm{AveP}(q)$$


Q : The number of queries in the set. 
AveP(q) : The average precision (AP) for a given query, q. 

To fully evaluate the performance of the model, we need to analyze both precision and recall.
Precision is the proportion of positive identifications that were actually correct (1), and
recall is the proportion of actual positives that the model correctly identified (2). Precision and recall are
calculated using true positives (TP), false positives (FP) and false negatives (FN) [31]; calculating these
metrics for all the objects present in the images gives the mAP. Evaluating on the test set once the
model is completely trained also helps to check its performance [32].


$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (1)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (2)$$
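
As a worked illustration of (1), (2) and the mAP formula, the sketch below computes the three quantities from counts. The counts and AP values are hypothetical and are not results of this study.

```python
def precision(tp, fp):
    """Equation (1): share of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Equation (2): share of actual positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def mean_average_precision(ave_p):
    """mAP: mean of the AveP(q) values over the Q queries (here, classes)."""
    return sum(ave_p) / len(ave_p)


# Hypothetical counts, not results from the experiment.
tp, fp, fn = 80, 20, 10
print(precision(tp, fp))                   # 0.8
print(round(recall(tp, fn), 3))            # 0.889
print(mean_average_precision([0.5, 1.0]))  # 0.75
```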

            from  n   params  module                                   arguments
  3           -1  1    73984  models.common.Conv                       [64, 128, 3, 2]
  4           -1  1   161152  models.common.BottleneckCSP              [128, 128, 3]
  5           -1  1   295424  models.common.Conv                       [128, 256, 3, 2]
  6           -1  1   641792  models.common.BottleneckCSP              [256, 256, 3]
  7           -1  1  1180672  models.common.Conv                       [256, 512, 3, 2]
  8           -1  1   656896  models.common.SPP                        [512, 512, [5, 9, 13]]
  9           -1  1  1248768  models.common.BottleneckCSP              [512, 512, 1, False]
 10           -1  1   131584  models.common.Conv                       [512, 256, 1, 1]
 11           -1  1        0  torch.nn.modules.upsampling.Upsample     [None, 2, 'nearest']
 12      [-1, 6]  1        0  models.common.Concat                     [1]
 13           -1  1   378624  models.common.BottleneckCSP              [512, 256, 1, False]
 14           -1  1    33024  models.common.Conv                       [256, 128, 1, 1]
 15           -1  1        0  torch.nn.modules.upsampling.Upsample     [None, 2, 'nearest']
 16      [-1, 4]  1        0  models.common.Concat                     [1]
 17           -1  1    95104  models.common.BottleneckCSP              [256, 128, 1, False]
 18           -1  1   147712  models.common.Conv                       [128, 128, 3, 2]
 19     [-1, 14]  1        0  models.common.Concat                     [1]
 20           -1  1   313088  models.common.BottleneckCSP              [256, 256, 1, False]
 21           -1  1   590336  models.common.Conv                       [256, 256, 3, 2]
 22     [-1, 10]  1        0  models.common.Concat                     [1]
 23           -1  1  1248768  models.common.BottleneckCSP              [512, 512, 1, False]
 24 [17, 20, 23]  1    16182  models.yolo.Detect                       [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116,
Model Summary: 283 layers, 7255094 parameters, 7255094 gradients


Figure 8. Model summary 
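
The parameter counts of the Conv rows in Figure 8 can be checked by hand. The sketch assumes that each Conv module is a convolution without bias followed by batch normalization with a learnable scale and shift per output channel; the function name is hypothetical.

```python
def conv_bn_params(c_in, c_out, k):
    """Parameters of a bias-free k x k convolution followed by batch norm."""
    conv_weights = c_in * c_out * k * k
    bn_scale_and_shift = 2 * c_out
    return conv_weights + bn_scale_and_shift


print(conv_bn_params(64, 128, 3))   # 73984, as in row 3
print(conv_bn_params(256, 128, 1))  # 33024, as in row 14
```

The same rule reproduces rows 5, 7, 10, 18 and 21 of the summary.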


To display the results of the training, we used the visualization tool TensorBoard. It allows us
to visualize the model graph and to plot quantitative metrics about its execution. The following
paragraphs illustrate the results obtained in terms of precision, recall and mean average precision.

As shown in the results, we trained for 200 epochs and obtained good outcomes. The precision
increases with the number of epochs, which indicates that the model learns more information in each
epoch. The model reached a precision of 83%, as shown in Figure 9. Moreover, the recall reaches 93%,
which means that the model is efficient in identifying the relevant
data, as shown in Figure 10. This training returns a mAP of 94.4% at an IoU threshold of 0.5, shown in Figure 11.
In Figure 12, the graphs plotted in Python show the same result in terms of recall, precision and mAP.
The label predictions on the testing set images also show precise detection, as shown in Figure 13.
These experiments therefore demonstrate good results in training the CNN model to address weed classification in real time.

Although the results above are sufficient to evaluate the model, we also plotted the same
metrics with Python, as shown in Figure 12; these graphs show the same results
that we obtained in TensorBoard. In Figure 13, we show model inference on the test images, an
evaluation that is also commonly used to assess the performance of the model.
Figure 9. Model precision    Figure 10. Model recall

Figure 11. The mAP measured at 0.5


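from utils.utils import plot_results  # plotting helper from the model's code base
from IPython.display import Image  # displays an image file in a notebook
plot_results()  # plot the results obtained in training and save them as training.png
Image(filename='./training.png')  # view training.png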
Figure 13. Label predictions on test images

The main result of this study demonstrates the efficacy of YOLO and its ability to identify
weeds in real time; the precision, recall and mAP obtained above show that this model is
accurate and fast. This led us to adopt it in an intelligent automated real-time weed detection system in
wheat crops, to achieve localized spraying of these weeds with the correct herbicide instead of spraying the
entire surface. The idea consists of deploying this trained deep learning model, saved in .h5 format, on a
Raspberry Pi equipped with a camera that scans the field in motion to detect weeds in real time [33]. The
Raspberry Pi sends a start order to a 12 V pump [34], which works alternately with other pumps depending on the
species of the weed detected. This system is also configured so that it can be controlled remotely
from another computer or mobile device [35]. Figure 14 shows the synoptic diagram of this spraying
system.


[Figure 14 diagram: Raspberry; spraying]


Figure 14. Overview of the spraying system


Where:
T1, T2, and T3 : Tanks of herbicides A, B, C..., chosen depending on the weed species present in the field.
P1, P2, and P3 : Pumps for herbicides A, B, C.
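
The pump selection logic can be sketched as below. The mapping of weed classes to pumps, the confidence threshold and the `switch` callback (which stands in for whatever output interface drives the 12 V pumps) are all assumptions of this illustration, not the deployed program.

```python
# Hypothetical mapping from detected weed class to the pump of its herbicide.
PUMP_FOR_WEED = {"phalaris": "P1", "convolvulus": "P2"}


def pumps_to_start(detections, min_confidence=0.5):
    """Return the sorted pumps to switch on for one camera frame.

    `detections` is a list of (class_name, confidence) pairs.
    """
    pumps = set()
    for label, confidence in detections:
        if confidence >= min_confidence and label in PUMP_FOR_WEED:
            pumps.add(PUMP_FOR_WEED[label])
    return sorted(pumps)


def drive_pumps(active, switch=print):
    """Send an ON or OFF order to every pump; `switch` stands in for the
    hardware call on the board (an assumption of this sketch)."""
    for pump in sorted(set(PUMP_FOR_WEED.values())):
        switch(pump, "ON" if pump in active else "OFF")


frame = [("phalaris", 0.9), ("convolvulus", 0.3)]
drive_pumps(pumps_to_start(frame))
```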


The added value of the system described above is that it is a smart system that
contributes to preserving the environment. It also makes economical use of herbicide, quickly identifies the
type of weed and determines the appropriate herbicide in real time, sprays the
affected areas effectively, produces a set of data that can be used in tracking and decision-making [36], and can be
controlled remotely. Finally, once the system has proven effective for its intended goal, the list of
unwanted weeds can be expanded by reprogramming it. We conclude with a brief comparison of
our proposed weed control system with the existing systems mentioned in the literature review in Table 3.


Table 3. The comparison of the proposed system with other existing methods


Criterion     Proposed weed control system                               Existing systems [37]-[39]
Time          Real-time detection and control (fast)                     Consume much time
Cost          Economical use of the herbicide; economical to establish   Too much herbicide is consumed
Control       Controlled remotely                                        Mechanical control
Data          Produce a set of data                                      No data generated
Environment   Only spray the affected areas (environmental protection)   Large-scale spraying (affects the environment and crops)
Technology    Simple                                                     Sometimes complex

4. CONCLUSION

We set out to work with a fast deep learning algorithm that identifies weeds in real time
in a cultivated area, and to implement a smart weeding system. In our experiments, we implemented an
open-source object detection model based on the latest version of YOLO and examined how it performs on
the weed detection problem. The results showed good accuracy, which encouraged us to adopt this model
and implement it in a Raspberry Pi based system that makes the spray decision automatically. Instead
of spraying one type of herbicide all over the plot, the system chooses the right herbicide depending on the
weed detected and sprays only the areas infested with weeds, which makes weeding more effective,
saves herbicide and protects the environment. In the future, further steps may be taken to improve the
performance of the system by adjusting other parameters, to make it a more powerful real-time weeding system.